Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction
Abstract
BACKGROUND We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares (PLS) model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine (SVM) model for the same dataset.

RESULTS Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors, which has an RMSE on an external test set of 46.6 °C and an R² of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21), an RMSE of 45.1 °C and an R² of 0.54. This model outperforms a kNN model (RMSE of 48.3 °C, R² of 0.47) for the same data and performs similarly to a Random Forest model (RMSE of 44.5 °C, R² of 0.55). However, it is much less prone to bias at the extremes of the range of melting points, as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest.

CONCLUSION With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously.
The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.
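
The three figures quoted for each model (RMSE, R² and the slope of the line through the residuals) can be illustrated with a short sketch. It is not taken from the paper and rests on several assumptions: residuals are taken as predicted minus observed and regressed on the observed values, R² is computed as the squared Pearson correlation, and the function names and the toy data are hypothetical. Under these assumptions, a negative slope means high melting points tend to be underpredicted and low ones overpredicted, and a slope nearer zero indicates less bias at the extremes.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def squared_correlation(observed, predicted):
    """Squared Pearson correlation (assumed here as the definition of R²)."""
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def residual_slope(observed, predicted):
    """Least-squares slope of residuals (predicted - observed) against
    the observed values. The sign convention is an assumption."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    mo, mr = _mean(observed), _mean(residuals)
    num = sum((o - mo) * (r - mr) for o, r in zip(observed, residuals))
    den = sum((o - mo) ** 2 for o in observed)
    return num / den


# Hypothetical toy data: predictions shrunk halfway toward the mean (100),
# mimicking a model that is biased at both ends of the range.
observed = [0.0, 50.0, 100.0, 150.0, 200.0]
predicted = [50.0, 75.0, 100.0, 125.0, 150.0]

print("RMSE:", rmse(observed, predicted))
print("R^2:", squared_correlation(observed, predicted))
print("Residual slope:", residual_slope(observed, predicted))
```

In this toy case the predictions are perfectly correlated with the observations, yet the residual slope is -0.5. This shows why the slope is reported alongside RMSE and R²: it captures bias toward the mean that a correlation-based score can miss.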
Journal:
- Chemistry Central Journal
Volume: 2
Publication date: 2008